Abstract

The Artificial Prediction Market is a recent machine learning technique
for multi-class classification, inspired by financial markets. It
involves a number of trained market participants that bet on the possible 
outcomes and are rewarded if they predict correctly. This paper gener- 
alizes the scope of the Artificial Prediction Markets to regression, where 
there are uncountably many possible outcomes and the error is usually 
the MSE. For that, we introduce the reward kernel that rewards each par- 
ticipant based on its prediction error and we derive the price equations. 
Using two reward kernels we obtain two different learning rules, one of 
which is approximated using Hermite-Gauss quadrature. The market set- 
ting makes it easy to aggregate specialized regressors that only predict 
when an observation falls into their specialization domain. Experiments 
show that regression markets based on the two learning rules outperform 
Random Forest Regression on many UCI datasets and are rarely outper- 
formed. 

1 Introduction 

Prediction markets are forums of trade where contracts on the outcomes of fu- 
ture events are bought and sold. Each contract is a wager that yields payment 
if its corresponding outcome occurs. Each market participant has an incentive 
to profit and therefore an incentive to predict accurately. The trading prices of 
contracts are determined by supply and demand. Highly demanded contracts 
are more expensive and represent an overall confidence that a corresponding 
outcome will be realized. On the other hand, less demanded contracts are less 
expensive and represent an overall lack of confidence that a corresponding out- 
come will be realized. These trading prices can be interpreted as the market's 
prediction of the outcome. Studies have shown that the trading prices even
estimate the true probability of the outcome Manski (2006). Prediction markets
have found use in predicting elections, decision making in both government and
business realms, and even sporting events Arrow et al. (2008). Their reported
accuracy and success motivated the development of the Artificial Prediction
Market Lay (2009); Lay & Barbu (2010); Barbu & Lay (2011) that attempts to
mimic a real prediction market in a machine learning setting. The Artificial
Prediction Market has empirically proven to be a competitive classifier aggregation
technique and motivates further investigation. It was proved in Barbu & Lay
(2011) that the Artificial Prediction Market learns by constrained Maximum
Likelihood.

In this paper we generalize the Artificial Prediction Market to regression. 
While the objective of classification is to predict a label from a finite set of labels, 
the objective of regression is to predict a real value response. We develop a 
mathematical analog of the Artificial Prediction Market, the Regression Market, 
to deal with real values, or uncountably many "labels". Regression markets 
are unusual in that contracts are no longer discrete and finite. Each contract 
corresponds to a real value prediction and consequently there are uncountably 
many such contracts for trade. While in classification a contract that has not 
predicted the correct outcome does not win anything, for regression we introduce 
the reward kernel that rewards contracts based on the distance to the ground 
truth value. 

We further show experiments on UCI Frank & Asuncion (2010) and LIAAD
Torgo (2010) data sets that demonstrate that the Regression Market is a
viable technique for aggregating regressors, and also works very well with
specialized regressors that only predict outcomes for certain instances and not for
others.



2 Related Work 



To the best of our knowledge, there has been no other work on solving regression
tasks with machine learning models of prediction markets. Related work can be
found for classification in Lay & Barbu (2010); Barbu & Lay (2011) where
Artificial Prediction Markets were developed for classification using betting
functions and an equilibrium based on conservation of budget sum.

Another model can be found in Storkey (2011) where machine learning markets
are instead derived from utility functions.

In Chen & Vaughan (2010) the authors find a connection between no-regret
learning and prediction markets.



3 Overview of the Artificial Prediction Markets 

In Lay & Barbu (2010), the classification market is defined by betting functions
$\phi_m^k(x, c)$ that describe the proportion of the budget $\beta_m$ that participant $m$
allots to label $k$ for a given instance $x$ and trading prices $c = (c_1, \ldots, c_K)$
for all labels. The equilibrium price $c$ is defined such that, for any label $y$,
the sum of profits equals the sum of losses

$$\sum_{m=1}^{M} \sum_{k=1}^{K} \beta_m \phi_m^k(x, c) = \frac{1}{c_y} \sum_{m=1}^{M} \beta_m \phi_m^y(x, c), \qquad y = 1, 2, \ldots, K$$

This equilibrium system corresponds to the update rule for the classification
market

$$\beta_m \leftarrow \beta_m - \beta_m \sum_{k=1}^{K} \phi_m^k(x, c) + \beta_m \frac{\phi_m^y(x, c)}{c_y}$$

for $m = 1, 2, \ldots, M$, where $y$ is the true label. The change in budget is the
participant's profit. With a little reworking, the above
equilibrium is equivalent to solving the following fixed point problem

$$c_k = \sum_{m=1}^{M} \beta_m \phi_m^k(x, c), \qquad k = 1, 2, \ldots, K$$

The trading price $c$ is considered to be an estimate of the conditional mass.
In fact, Barbu & Lay (2011) demonstrates that the classification market
maximizes log likelihood.
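As an illustration of the fixed point and the update rule, the following minimal sketch treats the special case of constant betting functions (bets that do not depend on the price), for which the fixed point is available in closed form. The function names and the toy numbers are assumptions made for illustration and are not taken from the original implementation; budgets are assumed to sum to 1 and each participant's bets are assumed to sum to 1.

```python
def constant_market_prices(budgets, bets):
    """Price c_k = sum_m beta_m * phi_m^k for price-independent bets.

    budgets: list of M budgets beta_m (assumed to sum to 1).
    bets: M lists of K bet proportions phi_m^k (each assumed to sum to 1).
    """
    num_labels = len(bets[0])
    return [sum(beta * phi[k] for beta, phi in zip(budgets, bets))
            for k in range(num_labels)]


def update_budgets(budgets, bets, prices, true_label):
    """beta_m <- beta_m - beta_m * sum_k phi_m^k + beta_m * phi_m^y / c_y."""
    return [beta - beta * sum(phi) + beta * phi[true_label] / prices[true_label]
            for beta, phi in zip(budgets, bets)]


# Toy market: two participants, two labels (made-up numbers).
budgets = [0.5, 0.5]
bets = [[0.8, 0.2], [0.4, 0.6]]
prices = constant_market_prices(budgets, bets)
new_budgets = update_budgets(budgets, bets, prices, true_label=0)
```

In this toy case the total budget is unchanged by the update, and the participant that bet more on the true label ends up with the larger budget.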



4 Regression Markets 

The extension of prediction markets to the regression problem proves to be
counterintuitive. In classification, the goal is to predict the one correct label
for a given instance. What can be said about regression? Assume, for the
time being, that the classification market framework generalizes. For the sake of
consistency with probability notation, $\phi(y|x, c)$ will denote a betting functional
that allots a proportion of the budget for response $y \in \mathbb{R}$. This implies that

$$0 \le \int_Y \phi(y|x, c)\,dy \le 1 \tag{1}$$

since no participant may bet more than the whole of their budget in this market.
A curious consequence of this constraint is that it is possible for $\phi(y|x, c) > 1$
for some $y$. Likewise, the trading prices for $y$ are denoted as the price function
$c(y|x)$. The trading price is a conditional density on the possible responses $y$.
The prediction can be computed from, for example, the expectation

$$\hat{y} = \int_Y t\, c(t|x)\,dt \tag{2}$$

However, the price function can also model ambiguous responses. For example,
points along a circle could result in a bimodal price function.

The equilibrium price function $c(y|x)$ receives similar treatment as in the
classification market. The objective is to find a $c(y|x)$ that gives conservation of budget.
The ambiguity of the correct label mentioned above is resolved by introducing
a reward kernel $K(t; y)$. The reward kernel is a density with a single mode
centered about the ground truth $y$. The winnings are subsequently defined as

$$\text{winnings} = \int_Y K(t; y) \frac{\phi(t|x, c)}{c(t|x)}\,dt \tag{3}$$

which bears similarity to the winnings in the classification market. This has the
effect of partially rewarding participants for nearby predictions. Likewise, the
total expenditures for contracts are given as

$$\text{bet} = \int_Y \phi(t|x, c)\,dt \tag{4}$$

Analogous to the classification market, the equilibrium price function $c(y|x)$ is
defined such that gains match losses

$$\sum_{m=1}^{M} \beta_m \int_Y K(t; y) \frac{\phi_m(t|x, c)}{c(t|x)}\,dt = \sum_{m=1}^{M} \beta_m \int_Y \phi_m(t|x, c)\,dt \tag{5}$$

4.1 Constant Market for Regression 

For simplicity and the reported empirical performance of the constant
classification market, the remainder of this paper assumes $\phi(y|x, c) = h(y|x)$ where
$h(y|x)$ is a conditional density with mean $f(x)$. Here $f(x)$ is a regressor. This
defines the constant market for regression with

$$c(y|x) = \sum_{m=1}^{M} \beta_m h_m(y|x) \tag{6}$$

$$\hat{y} = \int_Y t\, c(t|x)\,dt = \sum_{m=1}^{M} \beta_m f_m(x) \tag{7}$$
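As a short check, the constant-market price function (6) can be substituted into the equilibrium condition (5). The derivation below is a sketch under two assumptions that are not stated explicitly in the text: the budgets are normalized so that $\sum_m \beta_m = 1$, and $c(t|x) > 0$ wherever $K(t; y) > 0$.

```latex
\begin{align*}
\sum_{m=1}^{M} \beta_m \int_Y K(t; y) \frac{h_m(t|x)}{c(t|x)}\,dt
  &= \int_Y K(t; y) \frac{\sum_{m=1}^{M} \beta_m h_m(t|x)}{c(t|x)}\,dt
   = \int_Y K(t; y)\,dt = 1, \\
\sum_{m=1}^{M} \beta_m \int_Y h_m(t|x)\,dt
  &= \sum_{m=1}^{M} \beta_m = 1.
\end{align*}
```

Both sides equal 1, so under these assumptions gains match losses for any reward kernel that is a density. The prediction (7) follows in the same way from linearity of the integral, since each $h_m(y|x)$ has mean $f_m(x)$.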



The update rule is similar to that of the classification market except for
the additional reward kernel

$$\beta_m \leftarrow \beta_m + \eta \beta_m \left( \int_Y K(t; y) \frac{h_m(t|x)}{c(t|x)}\,dt - 1 \right) \tag{8}$$

where $\eta$ is the learning rate and also serves to prevent instantaneous bankruptcy
(i.e. $\beta_m = 0$). The choice of $K(t; y)$ gives different update rules. We examine
$K(t; y) = \delta(t - y)$ where $\delta(t)$ is the Dirac delta function and
$K(t; y) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-y)^2}{2\sigma^2}}$.

4.2 Delta Updates 

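When $K(t; y) = \delta(t - y)$ this gives an update rule analogous to that of the
classification market

$$\beta_m \leftarrow \beta_m + \eta \beta_m \left( \frac{h_m(y|x)}{c(y|x)} - 1 \right) \tag{9}$$

Even though this reward kernel is exacting, it will be shown empirically to work
relatively well.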
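The delta update can be sketched in a few lines for a single training pair. This is a minimal illustration, assuming each participant's density $h_m(y|x)$ is Gaussian with a known mean and standard deviation at the given instance; the helper names and the toy numbers are hypothetical and not taken from the original implementation.

```python
import math


def gaussian_pdf(t, mean, std):
    """Density of N(mean, std^2) evaluated at t."""
    z = (t - mean) / std
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * std)


def market_price(t, budgets, means, stds):
    """Constant-market price c(t|x) = sum_m beta_m * h_m(t|x)."""
    return sum(b * gaussian_pdf(t, mu, s)
               for b, mu, s in zip(budgets, means, stds))


def delta_update(budgets, means, stds, y, eta):
    """beta_m <- beta_m + eta * beta_m * (h_m(y|x) / c(y|x) - 1)."""
    c_y = market_price(y, budgets, means, stds)
    return [b + eta * b * (gaussian_pdf(y, mu, s) / c_y - 1.0)
            for b, mu, s in zip(budgets, means, stds)]


def predict(budgets, means):
    """Market prediction: budget-weighted mean of the participants."""
    return sum(b * mu for b, mu in zip(budgets, means))


# Toy example (made-up numbers): two participants predicting 1.0 and 3.0.
budgets, means, stds = [0.5, 0.5], [1.0, 3.0], [1.0, 1.0]
updated = delta_update(budgets, means, stds, y=1.2, eta=0.1)
```

With normalized budgets the update leaves the budget sum unchanged, and the participant whose density is higher at the observed response gains budget, which moves the market prediction toward that response.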
4.3 Gaussian Updates

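When $K(t; y) = \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-y)^2}{2\sigma^2}}$, this gives an update involving an integral

$$\beta_m \leftarrow \beta_m + \eta \beta_m \left( \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-y)^2}{2\sigma^2}} \frac{h_m(t|x)}{c(t|x)}\,dt - 1 \right) \tag{10}$$

One way to approximate this integral is with Hermite-Gauss quadrature Press
(2007). A change of variables is required to apply the quadrature rule

$$\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}\sigma} e^{-\frac{(t-y)^2}{2\sigma^2}} \frac{h_m(t|x)}{c(t|x)}\,dt \tag{11}$$

$$= \frac{1}{\sqrt{\pi}} \int_{-\infty}^{\infty} e^{-t^2} \frac{h_m(y + \sqrt{2}\sigma t|x)}{c(y + \sqrt{2}\sigma t|x)}\,dt \tag{12}$$

$$\approx \frac{1}{\sqrt{\pi}} \sum_{i=1}^{n} \omega_i \frac{h_m(y + \sqrt{2}\sigma t_i|x)}{c(y + \sqrt{2}\sigma t_i|x)}$$

where $\omega_i, t_i$ are the n-point Hermite-Gauss weights and nodal points.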
Intuitively, the choice of $\sigma$ should reflect the noise variance of the training data
(assuming Gaussian noise). If $\sigma$ is too small, the market is more prone to
overfitting. This $\sigma$ can be chosen with cross validation by discretizing $\alpha \in (0, 1]$
and trying $\sigma = \alpha \sqrt{\frac{1}{N} \sum_{n=1}^{N} y_n^2}$ (assuming the noise has mean 0).
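The quadrature approximation of the Gaussian update can be sketched as follows. This is a minimal illustration, assuming Gaussian participant densities and using the standard tabulated 5-point Gauss-Hermite nodes and weights for the weight function $e^{-t^2}$; the function names and toy numbers are hypothetical and not taken from the original implementation.

```python
import math

# Standard tabulated 5-point Gauss-Hermite rule for the weight exp(-t^2);
# the weights sum to sqrt(pi).
NODES = [-2.0201828704560856, -0.9585724646138185, 0.0,
         0.9585724646138185, 2.0201828704560856]
WEIGHTS = [0.01995324205904591, 0.3936193231522412, 0.9453087204829419,
           0.3936193231522412, 0.01995324205904591]


def gaussian_pdf(t, mean, std):
    """Density of N(mean, std^2) evaluated at t."""
    z = (t - mean) / std
    return math.exp(-0.5 * z * z) / (math.sqrt(2.0 * math.pi) * std)


def market_price(t, budgets, means, stds):
    """Constant-market price c(t|x) = sum_m beta_m * h_m(t|x)."""
    return sum(b * gaussian_pdf(t, mu, s)
               for b, mu, s in zip(budgets, means, stds))


def gaussian_reward(m, budgets, means, stds, y, sigma):
    """Quadrature estimate of the integral of K(t;y) h_m(t|x) / c(t|x)."""
    total = 0.0
    for node, weight in zip(NODES, WEIGHTS):
        t = y + math.sqrt(2.0) * sigma * node  # change of variables
        total += (weight * gaussian_pdf(t, means[m], stds[m])
                  / market_price(t, budgets, means, stds))
    return total / math.sqrt(math.pi)


def gaussian_update(budgets, means, stds, y, sigma, eta):
    """beta_m <- beta_m + eta * beta_m * (reward_m - 1)."""
    return [b + eta * b * (gaussian_reward(m, budgets, means, stds, y, sigma) - 1.0)
            for m, b in enumerate(budgets)]


# Toy example (made-up numbers): two participants predicting 1.0 and 3.0.
budgets, means, stds = [0.5, 0.5], [1.0, 3.0], [1.0, 1.0]
updated = gaussian_update(budgets, means, stds, y=1.2, sigma=0.5, eta=0.1)
```

Because the budget-weighted ratios $h_m/c$ sum to 1 at every node and the weights sum to $\sqrt{\pi}$, this approximation preserves the budget sum up to rounding when the budgets are normalized.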
4.4 Specialized Regression Markets 

Introduced in Lay & Barbu (2010), specialized markets are markets with
participants which have local support in the feature space. This type of participant
is assumed to perform relatively well in its domain. An example of a specialized
market is a market with random tree leaves as participants. These types of
markets have been demonstrated to be competitive with random forest. The
specialized regression market of tree leaves is similar except that leaves are
Gaussian instead of histograms. Each regression tree stores the sample mean $\bar{y}$
and variance $\sigma^2$ of instances that fall in each leaf.
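A specialized market of leaves can be sketched on a one-dimensional feature, where each leaf is an interval and only the leaves containing the instance take part in the prediction. The dictionary layout, the interval domains, and in particular the renormalization by the total budget of the active participants are assumptions of this sketch; the text does not spell out how inactive participants are handled.

```python
def specialized_prediction(x, participants):
    """Budget-weighted mean over the participants whose domain contains x.

    participants: list of dicts with keys 'low', 'high' (the leaf's
    interval [low, high)), 'budget' and 'mean' (the leaf's sample mean).
    Returns None when no participant covers x.
    """
    active = [p for p in participants if p["low"] <= x < p["high"]]
    total_budget = sum(p["budget"] for p in active)
    if total_budget == 0.0:
        return None
    return sum(p["budget"] * p["mean"] for p in active) / total_budget


# Toy forest (made-up numbers): two trees with two leaves each.
leaves = [
    {"low": 0.0, "high": 0.5, "budget": 0.25, "mean": 1.0},  # tree 1
    {"low": 0.5, "high": 1.0, "budget": 0.25, "mean": 3.0},  # tree 1
    {"low": 0.0, "high": 0.4, "budget": 0.25, "mean": 1.2},  # tree 2
    {"low": 0.4, "high": 1.0, "budget": 0.25, "mean": 2.6},  # tree 2
]
prediction = specialized_prediction(0.45, leaves)
```

For the instance 0.45 exactly one leaf per tree is active, so the prediction is the budget-weighted average of those two leaf means.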



5 Results 

We performed two types of experiments with both updates (9), (10) and
compared with Breiman's original regression results Breiman (2001) as well as
additional data sets from UCI and LIAAD Torgo (2010). To be consistent with
Breiman, nearly all experiments were conducted over 100 random splits where
each split randomly sets aside 10% of the data set for testing. For abalone, only
10 random splits with 25% of the data set aside for testing were considered.
Data sets with provided test sets were not randomly split. Instead, the forest
and markets were trained 100 times on the entire training set and tested on the
provided test set. These results vary due to the randomness of the regression
forest.

Table 1: Table of MSE for forests and markets on UCI and LIAAD data sets.
The F column is the number of inputs, Y is the range of regression, RFB is
Breiman's reported error, RF is our forest implementation, DM is the Market
with delta updates, and GM is the Market with Gaussian updates.
Bullets/daggers represent pairwise significantly better/worse than RF while +/-
represent significantly better/worse than RFB.

Data               N_train  N_test   F  Y                     RFB       RF           DM             GM
abalone               4177       -   8  [1.00, 29.00]         4.600     4.571        4.571          4.571
friedman1              200    2000  10  [4.30, 26.03]         5.700     4.343+       4.335•+        4.193•+
friedman2              200    2000   4  [-167.99, 1633.87]    19600.0   19431.852    19232.482•     18369.546•+
friedman3              200    2000   4  [0.13, 1.73]          0.022     0.028-       0.028•-        0.026•-
housing                506       -  13  [5.00, 50.00]         10.200    10.471       10.130•        10.128•
(missing)              330       -   8  [1.00, 38.00]         16.300    16.916       16.925         16.917
(missing)              167       -   4  [0.13, 7.10]          0.246     0.336        0.295          0.322
ailerons              7154    6596  40  [-0.00, -0.00]        -         2.814e-008   2.814e-008•    2.814e-008•
auto-mpg               392       -   7  [9.00, 46.60]         -         6.469        6.444          6.405
(missing)              159       -  15  [5118.00, 35056.00]   -         3823550.43   3723413.430    3815863.98
(missing)             4500    3693  32  [0.00, 0.67]          -         7.238e-003   7.212e-003•    7.210e-003•
breast cancer          194       -  32  [1.00, 125.00]        -         1112.270     1112.509       1108.325
cart example         40768       -  10  [-12.69, 12.20]       -         1.233        1.233†         1.232•
computer activity     8192       -  21  [0.00, 99.00]         -         5.414        5.398•         5.414†
diabetes                43       -   2  [3.00, 6.60]          -         0.415        0.426†         0.415
(missing)             8752    7847  18  [0.01, 0.08]          -         9.319e-006   9.288e-006•    9.225e-006•
forestfires            517       -  12  [0.00, 1090.84]       -         5834.819     5844.493†      5680.131•
kinematics            8192       -   8  [0.04, 1.46]          -         0.013        0.013•         0.013•
machine                209       -   6  [6.00, 1150.00]       -         3154.521     2991.798•      3042.336
poletelecomm          5000   10000  48  [0.00, 100.00]        -         29.813       28.855•        29.863†
(missing)             4499    3693  32  [-0.09, 0.09]         -         9.237e-005   8.917e-005•    8.888e-005•
pyrimidines             74       -  27  [0.10, 0.90]          -         0.013        0.013          0.012
triazines              186       -  60  [0.10, 0.90]          -         0.015        0.015          0.015

All experiments were run on Windows 7 with 8GB of RAM and a Core i7-2630QM
processor (max 2.9GHz, 6MB L3 cache). On each training set 100 regression trees
were trained. Each regression tree node considered 25 randomized features,
each a linear combination of 2 random inputs. Each coefficient of the linear
combination was uniformly picked from [-1, 1]. In our implementation, 1000
of these random features were generated in advance rather than at each node.
The split criterion for each node is based on the weighted sample variance. The
rule "don't split if the sample size is < 5" was enforced. Additionally, our
implementation treats categoricals as numeric inputs, which differs from Breiman's
implementation. However, most data sets are comprised of numeric inputs.
Both market types were trained and evaluated over 50 epochs. Each epoch is
one complete pass through the training set. The reported errors are those that
minimize the MSE of the test set over the 50 epochs (averaged over the 100
runs).

$$\mathrm{MSE} = \frac{1}{N} \sum_{n=1}^{N} \left( f(x_n) - y_n \right)^2 \tag{14}$$
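The error reporting described above can be sketched as follows. The sketch follows one possible reading of the procedure, namely that the test MSE is first averaged over the runs for each epoch and the smallest of these averages is then reported; the function names and the toy numbers are hypothetical.

```python
def mse(predictions, targets):
    """Mean squared error (1/N) * sum_n (f(x_n) - y_n)^2."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def best_epoch_error(mse_by_run):
    """Return (epoch index, run-averaged MSE) for the best epoch.

    mse_by_run[r][e] is the test MSE of run r after epoch e.
    """
    num_runs = len(mse_by_run)
    num_epochs = len(mse_by_run[0])
    averaged = [sum(run[e] for run in mse_by_run) / num_runs
                for e in range(num_epochs)]
    best = min(range(num_epochs), key=averaged.__getitem__)
    return best, averaged[best]


# Toy example (made-up numbers): two runs, three epochs.
toy_errors = [[3.0, 2.0, 2.5],
              [3.0, 1.0, 2.5]]
best_epoch, best_error = best_epoch_error(toy_errors)
```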



The learning rate $\eta = \frac{1}{N}$ was used as in Barbu & Lay (2011). On the first
run (random split or full training set), the parameter $\sigma$ for the Gaussian Market
reward kernel was estimated using 2-fold cross validation on the training set.
This $\sigma$ remained constant for the other 99 runs (9 runs for abalone). The
Gaussian market used 5-point Hermite-Gauss quadrature. The prediction for $y$
was computed with expectation

$$\hat{y} = \int_Y t\, c(t|x)\,dt = \sum_{m=1}^{M} \beta_m f_m(x) \tag{15}$$



In every result, significance is measured with significance level $\alpha = 0.01$ in two
ways: pairwise t-test Demšar (2006) and t-test on the means. The pairwise
t-test was used to compare the 100 market runs with the 100 forest runs while
the t-test on the means was used to compare with Breiman's reported results.
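The pairwise comparison can be sketched with the standard paired t statistic, computed on the per-run differences between two methods. This is an assumption about what the pairwise test amounts to, and the error values below are made up for illustration rather than taken from the experiments. The statistic would then be compared with a tabulated critical value for the chosen significance level (approximately 2.63 for a two-sided test at 0.01 with 99 degrees of freedom).

```python
import math
import statistics


def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic for the per-run differences errors_a - errors_b."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / std_error


# Toy per-run MSE values (made up): a market versus a forest on four runs.
market_errors = [4.1, 4.3, 4.0, 4.2]
forest_errors = [4.4, 4.5, 4.3, 4.6]
t_stat = paired_t_statistic(market_errors, forest_errors)
```

A negative statistic here means the first method has the lower average error across the paired runs.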



5.1 Comparison with Random Forest Regression 

The first experiment considers aggregation of tree leaves of forests with fully 
grown trees on UCI and LIAAD data sets. The results of seven of the data 
sets are compared with Breiman's reported results. The missing data set Robot 
Arm is private. 

From Table 1, our RF does not perform identically to RFB. This can be attributed
to the synthetic nature of some data sets such as friedman1, friedman2, and
friedman3 and/or the fact that our implementation of regression forest does not 
treat categorical inputs the same way. Of the Breiman comparisons, only GM 
is legitimately significantly better than Breiman's results for friedman2. Out 
of all the data sets, DM is significantly better than RF for 12 data sets (in 
a pairwise sense) while GM is only significantly better than RF for 11 data 
sets. However, DM is significantly worse than RF for 3 data sets while GM is 
only significantly worse on 2 data sets. The significantly worse results can be 
attributed to overfitting and/or poorly tuned reward kernel in the case of GM. 



Table 2: Table of MSE for depth 5 forests and markets on UCI and LIAAD
data sets. The F column is the number of inputs, Y is the range of regression,
RFB is Breiman's reported error (these errors are from fully grown trees), RF
is our forest implementation, DM is the Market with delta updates, GM
is the Market with Gaussian updates, and Speedup is the speedup factor of a
depth 5 tree versus a depth 10 tree for evaluation. Bullets/daggers represent
pairwise significantly better/worse than RF while +/- represent significantly
better/worse than RFB.

Data               N_train  N_test   F  Y                     RFB       RF           DM             GM             Speedup
abalone               4177       -   8  [1.00, 29.00]         4.600     4.438        4.318•+        4.438          3.3
friedman1              200    2000  10  [4.30, 26.03]         5.700     5.076+       4.701•+        4.429•+        1.8
friedman2              200    2000   4  [-167.99, 1633.87]    19600.0   29343.562-   23200.438•-    21183.421•-    1.9
friedman3              200    2000   4  [0.13, 1.73]          0.022     0.034-       0.029•-        0.028•-        2.0
housing                506       -  13  [5.00, 50.00]         10.200    12.869-      12.056•-       11.947•-       2.2
(missing)              330       -   8  [1.00, 38.00]         16.300    16.976       16.964         16.932         2.1
(missing)              167       -   4  [0.13, 7.10]          0.246     0.248        0.241          0.254          1.6
auto-mpg               392       -   7  [9.00, 46.60]         -         8.248        7.817•         7.750•         2.1
(missing)              159       -  15  [5118.00, 35056.00]   -         4699789.7    4524741.81     4431992.3      1.4
breast cancer          194       -  32  [1.00, 125.00]        -         1073.319     1071.820       1072.126       2.1
diabetes                43       -   2  [3.00, 6.60]          -         0.400        0.426†         0.393          0.7
forestfires            517       -  12  [0.00, 1090.84]       -         4945.630     5445.001†      5196.451†      2.2
machine                209       -   6  [6.00, 1150.00]       -         3137.001     3127.932       2930.506       1.8
triazines              186       -  60  [0.10, 0.90]          -         0.016        0.015•         0.015•         2.0

5.2 Fast Regression using Shallow Trees



This experiment examined the aggregation capabilities of the regression market
with shallow trees. In many problems, it is prohibitively expensive to train and
even evaluate deep trees. In practice this is mitigated by enforcing a maximum
tree depth. For example in Criminisi et al. (2011) and Girshick & Criminisi
(2011) the regression trees were constrained to depth 7. However, this strict
constraint on tree depth is prone to introduce leaves that do not generalize well
due to prematurely halting tree growth. The specialized regression market of
tree leaves can be used to weight the leaves. Poorly performing leaves will tend
to have less weight thus improving the overall prediction accuracy.

In addition to the previously mentioned experiment details, regression trees were
grown with a maximum depth of 10. Using the same depth 10 trees, MSE errors
were computed for leaves no deeper than depth 5. Both depth 5 and depth 10
evaluations for training and test sets were recorded. The timings for the larger of
the two sets were averaged over the 100 runs and used to compute the speedup.
The markets were applied to the depth 5 leaves only. Since the market is just a
linear aggregation of 100 leaves per instance, the reported speedup for forest is
similar to the speedup of the market.
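Evaluating a deep tree as if it were shallower can be sketched by stopping the descent at a maximum depth and returning the statistic stored at the node that was reached. The sketch assumes that every node, not only the leaves, stores the sample mean of the training responses that reached it; the `Node` class and the toy tree are hypothetical and not taken from the original implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    mean: float                      # sample mean of responses at this node
    feature: int = 0                 # index of the input used for the split
    threshold: float = 0.0           # go left when x[feature] <= threshold
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict_truncated(root, x, max_depth):
    """Descend at most max_depth splits and return the mean stored there."""
    node, depth = root, 0
    while node.left is not None and node.right is not None and depth < max_depth:
        node = node.left if x[node.feature] <= node.threshold else node.right
        depth += 1
    return node.mean


# Toy tree of depth 2 (made-up numbers) on a single input.
tree = Node(mean=2.0, feature=0, threshold=0.5,
            left=Node(mean=1.0),
            right=Node(mean=3.0, feature=0, threshold=0.8,
                       left=Node(mean=2.5), right=Node(mean=3.5)))
shallow = predict_truncated(tree, [0.9], max_depth=1)
deep = predict_truncated(tree, [0.9], max_depth=2)
```

The same stored trees serve both evaluations, so the shallow and deep predictions can be compared without retraining.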

From Table 2 it can be seen that the depth 5 forest is roughly twice the speed of the
depth 10 forest. On diabetes, the small data set, features and forest likely fit
in cache giving the strange 0.7 speedup. DM performs significantly better than
RF on seven data sets (in a pairwise sense) while GM only performs significantly
better on six data sets. However, DM performs significantly worse on two data
sets while GM performs significantly worse on one. No method legitimately
performs significantly better than RFB since RF is already better than RFB
on those two data sets. The significantly worse results can be attributed to
overfitting and/or poorly tuned reward kernel in the case of GM.



6 Conclusion 

This work presented a generalization of the Artificial Prediction Markets from 
classification to regression with uncountably many outcomes. It introduced two 
types of update rules and demonstrated their learning ability through experi- 
ments on UCI and LIAAD datasets. Furthermore, it showed the capability of 
the regression market to aggregate shallow tree leaves into much better regres- 
sors than those obtained by voting. In future work we plan to use the market 
for regression with non-uniform noise levels and multi-modal conditional
probabilities $p(y|x)$.